Background: SARS-CoV-2 and open systems

The global spread of SARS-Cov2 and the abrupt rise of the pandemic flag by the World Health Organisation only one month after the first SARS-CoV-2(+) cases in Wuhan city, China (Sohrabi et al. 2020). However, the up-to-date statistics of the World Health Organization, shows that the northern hemisphere has been the most affected, highlighting the role of human mobility, or population kinetics, as the crucial feature to manage the local and global spread (Kraemer et al. 2020; Chinazzi et al. 2020; Vespignani 2012). Although migration between countries has been actively controlled, within-country migration still running in countries without an effective lockdown maintaining the spread of the virus with an increasing number of positive confirmed cases (Fig. 1). The mobility between territorial administrative units, such as countries, cities or districts, highlights their openness to human mobility, hence to the spread of the current coronavirus at a global scale. Such openness implies that the spread of the virus not only emerges from local infections, but also from human mobility from the focal area to others.

A formal mathematical approach to tackle such open systems has been developed in ecology under the consideration of birth-death processes underlying the dynamics of the focal system (Wright 1937; Marquet et al. 2017). For instance, for species diversity, the birth-death processes correspond to the events of speciation/inmigration and local extinction, respectively. Marquet et al. (2017) developed a model for species proportional abundance distributions under an open system- stochastic- paradigm; by modelling through a diffusion approach the species proportional abundance distribution (SPAD) and showing that the SPAD converges to a standard beta distribution (Eq. (1)). This approach is consistent with previous approaches at another biological scale, where the frequency of genes in a metapopulation system also converges to the standard beta distribution (Wright 1937, 1931). The consideration of the proportional abundance as the observable of a focal system, let us call it \(x\) for simplicity, as a continuous variable allows us to fit a beta distribution whose probability density function \(f(\cdot)\) is a power function of the observable x with two parameters (\(\alpha\),\(\beta\)) with the form

\[\begin{equation} f(x) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \tag{1} \end{equation}\]

Where \(\Gamma\) is the gamma function. Further, Marquet et al. showed that the estimated parameters describing the beta distribution (\(\alpha\),\(\beta\)), can be biologically interpreted as the role played by immigration to the focal system of new or already present species in the focal system. For SARS-CoV-2, the estimated parameters of the beta distribution recapitulate the birth-death processes of the infection. The dynamic in the coronavirus proportinal abundance (Fig. 2) encompasses the kinetics from infected individuals to recovered/dead individuals in a local population open to the migration of individuals from/to neighbour populations. The study of the coronavirus pandemic, and its statistical description, contribute to the understanding of the spread of emerging diseases in a complex and entangled world. Here, by analysing the public database gathered by the Johns Hopkins University; we seek to statistically characterise the spread of SARS-CoV-2 over the world by analysing the spatiotemporal behaviour of the descriptors of the proportional abundance of SARS-CoV-2 by country and through the time.

The CPAD R package

To automatise the data graphics and analyses, we developed a package called CPAD, after Coronavirus Proportional Abundance Distribution. This package constains eight functions that will help the user to download up-to-date open access data maintaned by Johns Hopkins University, analyse it, and plot it.Here, we will briefly overview each function, its arguments and outputs with the data obtained to the 02 of May 2020.

Loading data: load_globalCOVID()

The function load_globalCOVID() allows to access and consolidate the global data of active SARS-CoV-2 cases stored and updated by Johns Hopkins University.The function does not have arguments.

#List global environment
ls()
#> [1] "encoding"   "input_file" "out_dir"

#Load data
load_globalCOVID()

#Verify the global environment
ls()
#> [1] "encoding"   "input_file" "out_dir"

This function returns different objects to your Global Environment:

  • data_deaths: dataframe. The number of global deaths by day due to COVID.
  • data_positives: dataframe. The number of SARS-CoV-2(+).
  • data_recovered: dataframe. The number of recovered from SARS-CoV-2(+).
  • inc_matrix: dataframe. The global active cases by country and day.
  • propab_matrix: dataframe The global active cases/country population by country and day.
  • inc_df: dataframe. A three column dataframe with country, date and the number of active cases.
  • propab_df: dataframe. A three column dataframe with country, date and the proportional abundance of active cases.
  • sumCases: numeric vector. The global sum of SARS-CoV-2(+) by day.
  • avCases: numeric vector. The global average of SARS-CoV-2(+) by day.

Plot global trajectories: plot_trajectories()

This function plots the temporal tendencies in #SARS-CoV-2(+) and its proportional abundance each log-transformed data and insets with non-transformed data. This function have the following arguments:

  • inc.df: dataframe. The incidence dataframe.
  • propab.df: dataframe. The proportional abundance dataframe.
  • plot.inc: TRUE or FALSE. Plot incidence trajectory. Default TRUE.
  • plot.propab: TRUE or FALSE. Plot proportional abundance trajectory. Default TRUE.
  • saveplots: TRUE or FALSE. Save the plots in your wd.Default TRUE.
  • saveplots.ext: character. The extension for the saved figures admitted by ggsave (e.g., “.png”, “.svg”). Default ".png".

Then, for example, we can call the function

#Only plots global incidence
plot_trajectories(inc.df=inc_df, propab.df=propab_df, plot.inc=TRUE, plot.propab=FALSE, saveplots= FALSE, saveplots.ext= ".png")
Temporal dynamics of global SARS-CoV-2 active cases. The main plot shows the log-transformed data while the inset shows the non-transformed dynamics.Colour in the main plot varies according to time.

Figure 1: Temporal dynamics of global SARS-CoV-2 active cases. The main plot shows the log-transformed data while the inset shows the non-transformed dynamics.Colour in the main plot varies according to time.

This function returns to your global environment an object list named plotTraj storing the plot(s) created. Also, if saveplots is TRUE, a folder named plots is created in your wd. Alternatively, we can view the evolution of SARS-CoV-2 proportional abundance by setting plot.propab=TRUE.

#Only plots global proportional abundance
plot_trajectories(inc.df=inc_df, propab.df=propab_df, plot.inc=FALSE, plot.propab=TRUE, saveplots= FALSE, saveplots.ext= ".png")
Temporal dynamics of global proportional abundance of SARS-CoV-2 active cases. The main plot shows the log-transformed data while the inset shows the non-transformed dynamics. Colour in the main plot varies according to time.

Figure 2: Temporal dynamics of global proportional abundance of SARS-CoV-2 active cases. The main plot shows the log-transformed data while the inset shows the non-transformed dynamics. Colour in the main plot varies according to time.

Analysing the coronavirus proportional abundance: analyse_CPAD()

This function analyses the coronavirus proportional abundance distribution (CPAD) and requires the following arguments:

  • propab.matrix: dataframe. The global active cases/country population (proportional abundance) by country and day.
  • propab.df: dataframe. The proportional abundance dataframe of three columns: Country, Day, and propAb.
  • sumCases: numeric vector. The daily sum of active SARS-CoV-2 cases; its length must be equal that the temporal slices present in propab.df.
  • avCases: numeric vector. The daily average of active SARS-CoV-2 cases; its length must be equal that the temporal slices present in propab.df.
  • saveplots: TRUE or FALSE. Save plots related to the distributions in your wd. Default TRUE

Then, for example, we can run

#In this case plots are not saved. The plots are histograms and Kurtosis/Skewness plots.
analyse_CPAD(propab.matrix=propab_matrix, propab.df=propab_df, sumCases=sumCases, avCases=avCases, saveplots= FALSE)

The function returns the CPAD dataframe to your Global Environment with the corresponding statistics for each day. If saveplots is TRUE, the plots folder is created in your wd, containing different folders with plots. The parameters were estimated via maximum likelihood implemented by the fitdistrplus package (version 1.0-14) with the Nelder-Mead optimisation algorithm (Box-Steffensmeier et al. (2014)).

The following functions use the output generated by analyse_CPAD()

Plot the temporal dynamics of the estimates \(\hat{\alpha}\) and \(\hat{\beta}\): plot_betatraj()

This function plots the temporal evolution of the estimated parameters of the beta distribution \(\hat\alpha\) and \(\hat\beta\). Its arguments are

  • CPAD: dataframe. The dataframe returned by analyse_CPAD.
  • saveplots:TRUE or FALSE. Save the plots in your wd.Default TRUE.
  • saveplots.ext:character. The extension for the saved figures admitted by ggplot2::ggsave (e.g., “.png”, “.svg”).Default “.png”.

The, we can run

#This function only prints a static plot in your Plot Panel
plot_betatraj(CPAD, saveplot=FALSE, saveplot.ext= ".png")
Temporal trajectory of the beta distribution estimates.

Figure 3: Temporal trajectory of the beta distribution estimates.

This function return to your global environment an object list called plotbetatraj with the plot(s) created. Also, if saveplot is TRUE, a folder named betaplots in the folder plots is created in your wd.

Visualise an animated plot of the beta distirbution estimates: plot_AnimatedBetaEst()

This function plots the animated temporal evolution of the estimated parameters of beta distribution \(\hat\alpha\) and \(\hat\beta\). It receives the arguments:

  • CPAD: dataframe. The dataframe returned by analyse_CPAD.
  • logScale: TRUE or FALSE. Log-transformed estimates.Default TRUE.
  • saveAnim: TRUE or FALSE. Save the animation.Default FALSE.

This function returns to your Viewer panel the animated temporal evolution of \(\hat\alpha\) and \(\hat\beta\) . Also, if saveAnim is TRUE, a folder named GIFs is created in your wd with the animation in gif format.


plot_AnimatedBetaEst(CPAD, logScale=TRUE, saveAnim=FALSE)
Temporal trajectory of the beta distribution estimates in motion.

Figure 4: Temporal trajectory of the beta distribution estimates in motion.

Detecting single discontinuities in the dynamics of \(\hat{\alpha}\) and \(\hat{\beta}\): analyseplot_shift()

This function calculates and plots a single shift in the temporal evolution of the estimated parameters of beta distribution \(\hat\alpha\) and \(\hat\beta\). The estimation of the single transition in the temporal trajectories of \(\hat\alpha\) and \(\hat\beta\) is performed via minimisation of the residual squared sum (RSS) (Bai (1994)). the arguments of this function are:

  • CPAD: dataframe. The dataframe returned by analyse_CPAD.
  • saveplot: TRUE or FALSE. Save the plots in your wd. Default TRUE.
  • saveplot.ext: character. The extension for the saved figures admitted by ggplot::ggsave (e.g., “.png”, “.svg”).Default “.png”.

This function returns to your global environment an object list named plotbetatraj with the plot(s) created. Also, if saveplot is TRUE, a folder named betaplots in the folder plots is created in your wd. Also, this function returns a numeric object dateshift to your Global Environment, it is the estimated date of the single transition.


analyseplot_shift(CPAD, saveplot=FALSE, saveplot.ext= ".png")
Detection of a single shifting point in the time series through the minimisiiton of the residual squared sum (RSS) for the estimated parameters (A-B). The analyses show the shift occurring the 28th of February (red segmented line). (C) The mean proportional abundance over time (log-scale) with the projected tipping point from (A-B).

Figure 5: Detection of a single shifting point in the time series through the minimisiiton of the residual squared sum (RSS) for the estimated parameters (A-B). The analyses show the shift occurring the 28th of February (red segmented line). (C) The mean proportional abundance over time (log-scale) with the projected tipping point from (A-B).


#The estimated date when the shifing point occurred
print(dateshift)
#> [1] "2020-02-28"

Plot global dynamics and beta estimates relations: plot_relations()

This function plots the relations between global dynamic indicators of the total number of active cases (#SARS-CoV-2(+)) and the number of countries with active cases (#Countries) against the beta distribution estimates \(\hat\alpha\) and \(\hat\beta\). Its arguments are:

  • CPAD: dataframe. The dataframe returned by analyse_CPAD.
  • saveplots: TRUE or FALSE. Save the plots in your wd. Default TRUE.
  • saveplots.ext: character. The extension for the saved figures admitted by ggsave (e.g., “.png”, “.svg”). Default “.png”.

plot_relations(CPAD, saveplot=FALSE, saveplot.ext= ".png")
Associations between the estimated parameters and the epidemiological data. (A) The global number of active SARS-CoV-2(+) cases and the estimated parameter a and (B) the number of countries with active cases and ß. Main plots show the log-transformed values of the estimated parameters, and each inset show the corresponding non-transformed estimates.

Figure 6: Associations between the estimated parameters and the epidemiological data. (A) The global number of active SARS-CoV-2(+) cases and the estimated parameter a and (B) the number of countries with active cases and ß. Main plots show the log-transformed values of the estimated parameters, and each inset show the corresponding non-transformed estimates.

This function returns to your global environment an object list named plotrelations with the plot(s) created. Also, if saveplot is TRUE, a folder named betaplots in the folder plots is created in your wd.

Visualise the relationship between the ratio of countries infected and E|prop abundance|: plot_ratioCountries()

This function plots the relations between the fraction of countries with active SARS-CoV-2 vs sum #SARS-CoV-2, E|#SARS-CoV-2|, and E|prop ab|.Its arguments are:

  • CPAD: dataframe. The dataframe returned by analyse_CPAD.
  • saveplots: TRUE or FALSE. Save the plots in your wd.Default TRUE.
  • saveplots.ext: character. The extension for the saved figures admitted by ggsave (e.g., “.png”, “.svg”).Default “.png”.

This function returns to your global environment an object list named plotratioCountries with the plot(s) created. Also, if saveplot is TRUE, a folder named betaplots in the folder plots is created in your wd.


plot_ratioCountries(CPAD, saveplot=FALSE, saveplot.ext= ".png")
Relationship between the fraction of countries with at least one active case and (A) he mean proportional abundance, (B) the number of active SARS-CoV-2(+) cases, and (C) the mean number of active cases.

Figure 7: Relationship between the fraction of countries with at least one active case and (A) he mean proportional abundance, (B) the number of active SARS-CoV-2(+) cases, and (C) the mean number of active cases.

Findings and significance

We have found that the temporal dynamics of the active cases , starting in January 2020, follow a non-linear behaviour either studying the global number of infected people or the CPAD with some striking bursts by the end of January and the end of February (Figs. 1 and 2). After mid- April it is observed a decreasing in the number of active cases in some countries. It is a consistent pattern either for the number of SARS-Cov-2 (Fig. 1) as for the proportional abundance of SARS-Cov-2 per country (Fig. 2).

From the beginning, the parameters decreasing trend. However, from the 14th of April the estimated parameters show a sustained increase (Figs. 3A-C and 4) with the consequent decrease in mean SARS-CoV-2(+) proportional abundance and its variance (Fig 3D-E). In detail, the statistical description of the dynamic enabled by the beta distribution fit on the CPAD shows that the evolution of the estimated parameters is non-linear and discontinuous (Fig. Fig 3). Particular dynamics show a sudden shift in the estimated parameters by the end of February (Fig. 3B-C), and the consequent modification of the mean SARS-CoV-2(+) proportional abundance and its variance (Fig 3D-E); respectively.

The estimation of the single breakpoint shows that the transition occurred on the 28th of February (Fig 5A-B), being consistent with a transition in the average proportional abundance (Fig 5C).The shift in the estimated parameters is associated with a transition in the global number of active SARS-CoV-2 and the number of countries reporting active cases (Fig. 6). In particular, after the transition in \(\hat{\alpha}\), the number of active cases explode (Fig 6A). A similar pattern observed for \(\hat{\beta}\) and the number of countries with active cases (Fig. 6B).

Interestingly, to that day, 48 out of the 169 countries found in the database (ca.30%) have al least one active case (Fig. 7). That percentage is consistent with reports showing that the percolation threshold of the plague, Yersinia pestis, in a network model occurs at a fraction of infected host about 0.3 (Davis et al. 2008, 2004). And, as we show, at that fraction occurs the transition in the average proportional abundance of SAR-CoV-2(+) cases (Fig 7A) which cannot be deduced without the consideration of spatial and discrete units (Fig. 7B-C).

Supplementary materials

Methods

The global scale data was obtained from an open repository maintained by the Johns Hopkins University Center for Systems Science and Engineering. These records allow estimating the CPAD daily since the 22nd of January of 2020 for 169 countries. Daily temporal slices correspond to metapopulation of focal systems (countries). The observable is the number of active SARS-Cov-2(+) cases divided by the total population of the respective country; we call this quantity coronavirus proportional abundance, and we analyse its distribution (CPAD). For each CPAD corresponding to each time slice, we fit a standard beta distribution of the form

\[\begin{equation} f(x) = \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \end{equation}\]

Where \(\Gamma\) is a gamma function and (\(\alpha\) and \(\beta\)) are the parameters of the beta distribution. The parameters were estimated via maximum likelihood implemented in R (version 3.6.2) by the fitdistrplus package (version 1.0-14) with the Nelder-Mead optimisation algorithm (Delignette-Muller and Dutang 2015). The estimated parameters allow quantifying the expectancy of the proportional abundance and its variance for each time slice as \[\begin{equation} {E|prop ab|}^t= \dfrac{{\hat{\alpha}}_t}{({\hat{\alpha}}_t+{\hat{\beta}}_t )} \end{equation}\]

\[\begin{equation} {Var|prop ab|}^t=\dfrac{({\hat{\alpha}}_t {\hat{\beta}}_t)}{/((1+{\hat{\alpha}}_t+{\hat{\beta}}_t ) {({\hat{\alpha}}_t+{\hat{\beta}}_t )}^2 )} \end{equation}\]

Where \({\hat{\alpha}}_t\) and \({\hat{\beta}}_t\) are the estimated parameters for the time slice \(t\). Once obtained the temporal trajectory of the estimated parameters, for each time series we estimated the presence of a single breakpoint k ̂ through the least- squared method for the minimisation of the residual squared sum (RSS) (Bai 1994; Box-Steffensmeier et al. 2014). For a series \(y_t=μ(t)+z_t\) with \(t=0,1,2,3,…,T\), where \(μ(t)\) is non-stochastic function in time and \(z_t\) is a linear stochastic process. To estimate the location of a single structural break \(\hat{\kappa}\)in the time series of y_t seeks to solve

\[\begin{equation} \hat{\kappa}=argmin_\kappa\big(\sum_{t=0}^\kappa{(y_t-\bar{y}_\kappa)}^2 + \sum_{t=\kappa+1}^T{(y_t-\bar{y}_\kappa^*)}^2 \big) \end{equation}\]

Where \(\bar{y}_\kappa\) denotes the mean in the observations for \(t=0,…,k\) and \(\bar{y}_\kappa^*\) denotes the mean in the observations for \(t=(k+1),…,T\).

Literature cited

Bai, Jushan. 1994. “Least Squares Estimation of a Shift in Linear Processes.” Journal of Time Series Analysis 15 (5): 453–72. https://doi.org/10.1111/j.1467-9892.1994.tb00204.x.

Box-Steffensmeier, Janet M., John R. Freeman, Matthew P. Hitt, and Jon C. W. Pevehouse. 2014. Time Series Analysis for the Social Sciences. Cambridge University Press.

Chinazzi, Matteo, Jessica T. Davis, Marco Ajelli, Corrado Gioannini, Maria Litvinova, Stefano Merler, Ana Pastore y Piontti, et al. 2020. “The Effect of Travel Restrictions on the Spread of the 2019 Novel Coronavirus (COVID-19) Outbreak.” Science, March. https://doi.org/10.1126/science.aba9757.

Davis, Stephen, Mike Begon, Luc De Bruyn, Vladimir S. Ageyev, Nikolay L. Klassovskiy, Sergey B. Pole, Hildegunn Viljugrein, Nils Chr Stenseth, and Herwig Leirs. 2004. “Predictive Thresholds for Plague in Kazakhstan.” Science 304 (5671): 736–38. https://doi.org/10.1126/science.1095854.

Davis, S., P. Trapman, H. Leirs, M. Begon, and J. a. P. Heesterbeek. 2008. “The Abundance Threshold for Plague as a Critical Percolation Phenomenon.” Nature 454 (7204): 634–37. https://doi.org/10.1038/nature07053.

Delignette-Muller, Marie Laure, and Christophe Dutang. 2015. “Fitdistrplus: An R Package for Fitting Distributions.” Journal of Statistical Software 64 (1): 1–34. https://doi.org/10.18637/jss.v064.i04.

Kraemer, Moritz U. G., Chia-Hung Yang, Bernardo Gutierrez, Chieh-Hsi Wu, Brennan Klein, David M. Pigott, Open COVID-19 Data Working Group†, et al. 2020. “The Effect of Human Mobility and Control Measures on the COVID-19 Epidemic in China.” Science, March. https://doi.org/10.1126/science.abb4218.

Marquet, Pablo A., Guillermo Espinoza, Sebastian R. Abades, Angela Ganz, and Rolando Rebolledo. 2017. “On the Proportional Abundance of Species: Integrating Population Genetics and Community Ecology.” Scientific Reports 7 (1): 16815. https://doi.org/10.1038/s41598-017-17070-1.

Sohrabi, Catrin, Zaid Alsafi, Niamh O’Neill, Mehdi Khan, Ahmed Kerwan, Ahmed Al-Jabir, Christos Iosifidis, and Riaz Agha. 2020. “World Health Organization Declares Global Emergency: A Review of the 2019 Novel Coronavirus (COVID-19).” International Journal of Surgery 76 (April): 71–76. https://doi.org/10.1016/j.ijsu.2020.02.034.

Vespignani, Alessandro. 2012. “Modelling Dynamical Processes in Complex Socio-Technical Systems.” Nature Physics 8 (1): 32–39. https://doi.org/10.1038/nphys2160.

Wright, Sewall. 1931. “Evolution in Mendelian Populations.” Genetics 16 (2): 97.

———. 1937. “The Distribution of Gene Frequencies in Populations.” Proceedings of the National Academy of Sciences of the United States of America 23 (6): 307–20. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1076930/.